AITopics | function similarity

Technology: Information Technology > Artificial Intelligence (1.00)

Neural Information Processing SystemsAug-22-2025, 01:43:24 GMT

Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity

This is, e.g., the typical setting of many ERM

artificial intelligence, complexity, machine learning, (15 more...)

Country:

Asia > Russia (0.04)
North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Neural Information Processing SystemsAug-14-2025, 06:28:24 GMT

44e65d3e9bc2f88b2b3d566de51a5381-Paper.pdf

algorithm, artificial intelligence, machine learning, (14 more...)

Country:

Asia > Russia (0.04)
North America > United States > New York (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Communications > Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Zhang, Junhui, Jaillet, Patrick

Multi-Timescale Gradient Sliding for Distributed Optimization

arXiv.org Artificial IntelligenceJun-19-2025

We propose two first-order methods for convex, non-smooth, distributed optimization problems, hereafter called Multi-Timescale Gradient Sliding (MT-GS) and its accelerated variant (AMT-GS). Our MT-GS and AMT-GS can take advantage of similarities between (local) objectives to reduce the communication rounds, are flexible so that different subsets (of agents) can communicate at different, user-picked rates, and are fully deterministic. These three desirable features are achieved through a block-decomposable primal-dual formulation, and a multi-timescale variant of the sliding method introduced in Lan et al. (2020), Lan (2016), where different dual blocks are updated at potentially different rates. To find an $ε$-suboptimal solution, the complexities of our algorithms achieve optimal dependency on $ε$: MT-GS needs $O(\overline{r}A/ε)$ communication rounds and $O(\overline{r}/ε^2)$ subgradient steps for Lipchitz objectives, and AMT-GS needs $O(\overline{r}A/\sqrt{εμ})$ communication rounds and $O(\overline{r}/(εμ))$ subgradient steps if the objectives are also $μ$-strongly convex. Here, $\overline{r}$ measures the ``average rate of updates'' for dual blocks, and $A$ measures similarities between (subgradients of) local functions. In addition, the linear dependency of communication rounds on $A$ is optimal (Arjevani and Shamir 2015), thereby providing a positive answer to the open question whether such dependency is achievable for non-smooth objectives (Arjevani and Shamir 2015).

artificial intelligence, function similarity, machine learning, (16 more...)

2506.15387

Country:

Europe > France > Hauts-de-France > Nord > Lille (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(6 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.93)

Cao, Tianyu, Chen, Xiaokai, Scutari, Gesualdo

DCatalyst: A Unified Accelerated Framework for Decentralized Optimization

arXiv.org Artificial IntelligenceJan-29-2025

We study decentralized optimization over a network of agents, modeled as graphs, with no central server. The goal is to minimize $f+r$, where $f$ represents a (strongly) convex function averaging the local agents' losses, and $r$ is a convex, extended-value function. We introduce DCatalyst, a unified black-box framework that integrates Nesterov acceleration into decentralized optimization algorithms. %, enhancing their performance. At its core, DCatalyst operates as an \textit{inexact}, \textit{momentum-accelerated} proximal method (forming the outer loop) that seamlessly incorporates any selected decentralized algorithm (as the inner loop). We demonstrate that DCatalyst achieves optimal communication and computational complexity (up to log-factors) across various decentralized algorithms and problem instances. Notably, it extends acceleration capabilities to problem classes previously lacking accelerated solution methods, thereby broadening the effectiveness of decentralized methods. On the technical side, our framework introduce the {\it inexact estimating sequences}--a novel extension of the well-known Nesterov's estimating sequences, tailored for the minimization of composite losses in decentralized settings. This method adeptly handles consensus errors and inexact solutions of agents' subproblems, challenges not addressed by existing models.

algorithm, artificial intelligence, machine learning, (18 more...)

2501.18114

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

Neural Information Processing SystemsJan-19-2025, 01:04:36 GMT

Optimal Gradient Sliding and its Application to Optimal Distributed Optimization Under Similarity

We study structured convex optimization problems, with additive objective r: p q, where r is ( \mu -strongly) convex, q is L_q -smooth and convex, and p is L_p -smooth, possibly nonconvex. For such a class of problems, we proposed an inexact accelerated gradient sliding method that can skip the gradient computation for one of these components while still achieving optimal complexity of gradient calls of p and q, that is, \mathcal{O}(\sqrt{L_p/\mu}) and \mathcal{O}(\sqrt{L_q/\mu}), respectively. This result is much sharper than the classic black-box complexity \mathcal{O}(\sqrt{(L_p L_q)/\mu}), especially when the difference between L_p and L_q is large. We then apply the proposed method to solve distributed optimization problems over master-worker architectures, under agents' function similarity, due to statistical data similarity or otherwise. The distributed algorithm achieves for the first time lower complexity bounds on both communication and local gradient calls, with the former having being a long-standing open problem.

application, optimal gradient sliding, optimization, (8 more...)

Technology: Information Technology > Artificial Intelligence (1.00)

Gruntkowska, Kaja, Tyurin, Alexander, Richtárik, Peter

Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity

arXiv.org Artificial IntelligenceFeb-9-2024

Effective communication between the server and workers plays a key role in distributed optimization. In this paper, we focus on optimizing the server-to-worker communication, uncovering inefficiencies in prevalent downlink compression approaches. Considering first the pure setup where the uplink communication costs are negligible, we introduce MARINA-P, a novel method for downlink compression, employing a collection of correlated compressors. Theoretical analyses demonstrates that MARINA-P with permutation compressors can achieve a server-to-worker communication complexity improving with the number of workers, thus being provably superior to existing algorithms. We further show that MARINA-P can serve as a starting point for extensions such as methods supporting bidirectional compression. We introduce M3, a method combining MARINA-P with uplink compression and a momentum step, achieving bidirectional compression with provable improvements in total communication complexity as the number of workers increases. Theoretical findings align closely with empirical experiments, underscoring the efficiency of the proposed algorithms.

artificial intelligence, machine learning, marina-p, (12 more...)

2402.06412

Country: Asia > Middle East > Saudi Arabia (0.04)

Genre: Research Report > New Finding (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.45)

Mengin, Elie, Rossi, Fabrice

Binary Diffing as a Network Alignment Problem via Belief Propagation

arXiv.org Artificial IntelligenceDec-31-2021

In this paper, we address the problem of finding a correspondence, or matching, between the functions of two programs in binary form, which is one of the most common task in binary diffing. We introduce a new formulation of this problem as a particular instance of a graph edit problem over the call graphs of the programs. In this formulation, the quality of a mapping is evaluated simultaneously with respect to both function content and call graph similarities. We show that this formulation is equivalent to a network alignment problem. We propose a solving strategy for this problem based on max-product belief propagation. Finally, we implement a prototype of our method, called QBinDiff, and propose an extensive evaluation which shows that our approach outperforms state of the art diffing tools.

artificial intelligence, machine learning, natural language, (20 more...)

2112.15337

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
(7 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.60)